INTRODUCTION



If you asked scientists what qualities make a good 
scientist they might come up with the following 
list: the ability to explain ideas and procedures in
written and oral form, to formulate and test 
hypotheses, to work with colleagues in a productive 
manner, to ask penetrating questions and make helpful 
comments when you listen, to choose interesting 
problems to work on, to design good experiments, and
to have a deep understanding of theories and questions 
in your field. Excellence in other school subjects, such 
as math, English, and history, requires similar abilities. 

If you think about how to assess such an array of 
different knowledge and abilities, it is clear that paper 
and pencil cannot in any direct way assess most such 
abilities. And yet our entire testing system is almost 
completely reliant on paper and pencil. It is really as 
questionable as trying to judge a gymnast's or 
musician's ability with paper-and-pencil testing. Paper
and pencil can only measure a small part of mathematical,
scientific, and language ability.

And yet everyone agrees that tests have a large 
effect on what is taught. Administrators, teachers, and
students will emphasize those abilities necessary to do
well on tests, and the pressures to do so are becoming 
more intense. If the testing system only taps a small 
part of what it means to know and do science or math 
or English or history, then testing will drive the system
to emphasize a small range of those abilities. We
would argue that it in fact has done just that. In science,
the paper-and-pencil testing system has driven education
to emphasize just two abilities: recall of facts
and concepts, and ability to solve short, well-defined 
problems. These two abilities do not, in any sense, 
represent the range of abilities required to be a good 
scientist.

We would argue that it is proper for assessment to 
drive the education system. People need goals as to 
what they should be learning, and tests encapsulate
abstract learning goals in a concrete form that every- 
one can understand. But there is a huge disparity 
between the goals realized in the current paper-and- 
pencil tests and the authentic goals of education we
should be pursuing as a society: to teach people how
to learn and think like scientists, writers, bookkeepers,
technicians, etc. In our view, education should pursue
the goal of producing thoughtful citizens who can meet
the changing demands of society (Collins, in press;
Zuboff, 1988).

Our thesis is that paper and pencil, video, and 
computers give three very different views of what 
students can do. It is like three different camera angles 
on the complete picture of a student. Whereas you
cannot possibly reconstruct the total person from just 
one angle, with three different views you can trian- 
gulate to get a much richer notion of what a student's 
abilities are. By enriching the way we assess students, 
we will enrich the way we educate them. 

STORIES ABOUT TRADITIONAL 
TEACHING AND TESTING 

There are several stories we like to tell to em- 
phasize why we need substantial restructur- 
ing of the way assessment is done in schools. 
The first comes from Alan Schoenfeld (in press) 
who observed a geometry teacher in the Rochester,
New York, schools who was reputed to be one of the
best teachers in the state because his students did so 
well on the Regents exam in geometry. It turned out 
that he had his students memorize the twelve proofs 
that might be on the Regents exam, which is a com- 
plete perversion of the goal of learning geometry. A 
similar tale comes from Jerry Pines, whose son took an 
AP English course in which the students never wrote 
more than a one-page paper because that is the length 
of writing required for the AP exam.

Another story comes from Sig Abeles who, with 
Joan Baron, administered a test statewide in Connecti- 
cut at the eighth and twelfth grade levels on density 
(which is taught in the eighth grade). Students did 
quite well on a multiple-choice test item, where they 
were given the weight and volume and asked to figure 
out the density. But when they were given a block of 
wood, a ruler, and a scale, only about 3% of the eighth 
graders and 12% of the twelfth graders could solve the 
problem. Simply stated, students often learn to give 
back answers to written items that they have no ability 
to apply in real life. 

The final story comes from Norman Frederiksen 
(1984), who during World War II was assigned to
improve testing procedures for the job of gunner's
mate in the Navy. This is a job that requires cleaning 
and maintaining guns on board ships, but he found 
that the teaching was by lecture and the testing was by 
paper and pencil. He proposed a performance test,
based on the tasks that gunner's mates actually carry
out. But the instructors objected to this because they
thought the students would fail. And they did. Sub- 
sequently, teaching practice changed in the courses, so 
that fairly soon students learned to do just as well on 
performance tests as they had previously done on 
pencil-and-paper tests. A similar change is reported to 
have occurred when performance testing was intro- 
duced into the elementary school science curriculum 
in New York State. If we change the way we test
students, it really does affect what is taught. 

A SYSTEMS APPROACH 
TO ASSESSMENT 

We have argued elsewhere (Frederiksen &
Collins, 1989) that if we are going to have
systemically valid tests (i.e., tests that
foster the learning of the knowledge and
skills that the test is designed to measure), then the
tests must meet four criteria:

1. Directness refers to the degree to which the test
specifically measures the knowledge and skill we 
want students to achieve, as opposed to measuring 
indicator variables for that knowledge and skill. Often 
directness is sacrificed for the sake of "objectivity." 

2. Scope refers to the degree to which all of the 
knowledge and skill required are assessed. If part is 
omitted, teachers and students will misdirect their 
teaching and learning in order to maximize scores on 
tests. 

3. Reliability refers to the degree to which different
judges assign the same score to an assessment. It is
critical to achieve fairness in any assessment.

4. Transparency refers to the ability of those be- 
ing assessed to understand the criteria on which they 
are being judged. If they are to improve their perfor- 
mance, the assessment must be transparent. 
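
The reliability criterion can be made concrete with a small calculation. The sketch below is only an illustration under our own assumptions: the criteria above do not name a statistic, so we use simple percent agreement and Cohen's kappa (a standard chance-corrected agreement measure) as stand-ins, and the two judges' scores are invented for the example.

```python
from collections import Counter


def percent_agreement(scores_a, scores_b):
    """Fraction of performances to which both judges gave the same score."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


def cohens_kappa(scores_a, scores_b):
    """Agreement between two judges, corrected for chance agreement."""
    n = len(scores_a)
    observed = percent_agreement(scores_a, scores_b)
    counts_a = Counter(scores_a)
    counts_b = Counter(scores_b)
    # Chance agreement: probability both judges pick the same score at random,
    # given how often each judge used each score.
    expected = sum(counts_a[s] * counts_b[s] for s in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used one identical score throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scores (1-4 scale) from two judges on eight recorded performances.
judge_1 = [3, 2, 4, 1, 3, 2, 4, 3]
judge_2 = [3, 2, 3, 1, 3, 2, 4, 2]

print(percent_agreement(judge_1, judge_2))
print(round(cohens_kappa(judge_1, judge_2), 3))
```

A low value on either measure would suggest that the criteria or the exemplars need refinement before scores can be treated as fair.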

We would argue that if school assessment is going 
to meet the criteria of directness and scope, assessment 
must go beyond pencil-and-paper testing. Video and 
computer technologies provide very different media 
for recording student performances, and make it possible
to construct assessments that more fairly repre-
sent the range of knowledge and skills toward which 
education should be directed. 

Frederiksen and Collins (1989) also developed a 
set of principles for the design of systemically valid
tests. Here we will briefly describe the components of
such a testing system and the methods by which the 
system encourages learning. The components of the 
system are: 

Set of tasks. The tasks should be authentic, eco- 
logically valid tasks that are representative of the 
kinds of knowledge and skills expected of the students 
(Brown, Collins, & Duguid, 1989; Wiggins, 1989). 

Criteria for each task and aspect of expertise. 
Performance on a task (or aspect of a task) should be 
evaluated in terms of a small number of criteria that 
the students understand. The criteria should be small 
in number so that students can focus on them, they 
should be learnable so that student efforts lead to 
improvement, and they should cover all aspects re- 
quired for good performance in the task. 

A library of exemplars. To ensure reliability of
scores and learnability, there needs to be a library of
records of student performances. These exemplars
should include critiques by master assessors in terms
of the criteria. They should be available to everyone,
particularly the testees.

A training system for scoring tests. There are three 
groups who must learn to reliably assess test performance:
(a) master assessors, (b) coaches, who for
students would be teachers, and (c) the testees. Master 
assessors are charged with maintaining standards, 
and must train teachers to coach students as to how to 
perform well. 

The methods for fostering improvement on the 
test include: 

Practice in self-assessment. Students should have
practice evaluating their test performance, which is
possible using recording technologies such as video or
computers (Collins & Brown, 1988).

Repeated testing. Students should have opportu- 
nities to take the test multiple times so they can strive 
to improve their scores. 

Feedback on test performance. When students 
take the test, there should be a review of their per- 
formance with a master assessor or coach to help them 
see how their performance might be improved. 



Multiple levels of success. There should be vari- 
ous landmarks of success, so that students can strive to 
do better. 

This briefly summarizes the design principles we 
proposed. They are elaborated in the Frederiksen and 
Collins (1989) paper. 

THE ROLES OF DIFFERENT MEDIA 
IN ASSESSMENT 

The three media (pencil and paper, computers,
and video) provide three different views
of students. Our goal in this section is to delineate
some of the different abilities that each
medium can tap in order to emphasize how to construct
a broader view of students.

The strength of the computer is its ability to track
the process of learning and thinking and to interact 
with students. This gives it a variety of ways to tap into 
aspects of students' abilities that the other media 
cannot: 

1. Computers can record how students learn 
with feedback. Because it is possible to put students 
into novel learning environments where the feedback 
is systematically controlled by the computer, it is 
possible to assess how well or how fast different 
students learn in such environments (Collins, 1990a). 
This can provide a measure, not just of current perfor- 
mance levels, but of learning ability in a particular 
domain. 

2. Computers can record students' thinking. 
Because computers can trace the process by which 
students maneuver through a problem or task, they 
can record various aspects of students' strategic pro- 
cesses (Collins, 1990a; Frederiksen & White, 1990). For 
example, it is possible to keep records of whether 
students systematically control variables when test- 
ing a hypothesis. It is also possible to look at their 
control or metacognitive strategies (Collins & Brown, 
1988; Schoenfeld, 1985) to determine what they do 
when they are stuck, how long they pursue dead ends, 
etc. In summary, the ability to trace the problem- 
solving process gives computers a way to measure the 
strategy aspects of their knowledge. 

3. Computers can record students' abilities to 
deal with realistic situations. Because computers can 
simulate real-world situations, like running a bank or
repairing broken equipment (Collins, 1990b), it is possible
to measure students' abilities in understanding
situations, integrating information from different 
sources, and reacting appropriately in real time. Paper 
and pencil and video really cannot simulate real situ- 
ations, so only computers give us a view of people's 
practical intelligence; that is, their ability to deal with 
realistic situations. 

Video provides a very different view of students' 
abilities because it can record their ongoing activities 
and explanations in rich detail. This makes it possible 
to evaluate other abilities: 

1. Video can record how students explain ideas 
and answer questions that challenge their under- 
standing. Oral presentation is critical to many aspects 
of life, and video enables us to capture student presen- 
tations in the same way we capture written presenta- 
tions with paper and pencil. With video we can see 
how well students integrate words and diagrams as 
they explain things. It is also possible to see how they 
answer challenging questions that their audience poses, 
how they deal with counterexamples and counterarguments,
and how they clarify points that are unclear
to the audience.

2. Video can record how well a student listens. 
Because video is a richly detailed medium, it is pos- 
sible to see how students listen to other students or 
adults, how well they ask questions, and critique or 
summarize what is said. Listening requires a variety 
of critical skills: communicating to the speaker what 
you don't understand, directing their discussion to 
the issues that are particularly important or relevant to
your needs, elaborating or synthesizing their remarks.
Video is the only medium that enables us to evaluate 
their listening ability. 

3. Video can record how well students cooper- 
ate in a joint task. Because video can record students' 
interactions, it can be used to measure how well they 
work with their partners, offer constructive comments, 
and monitor their partners' understanding. The skills 
of cooperating are critical to almost every aspect of 
life, and yet they are discouraged in most current 
school practice. 

4. Video can record how students carry out tasks
and perform experiments. Because video can record 
students carrying out actions, it makes it possible to 
evaluate their ability to perform science experiments, 
use tools, follow instructions, or create new objects. 
That is to say, video gives us the ability to see how
students are integrating their eyes, hands, voices, and
minds.

Paper and pencil can provide a much broader 
view of students than is currently employed in most 
testing. The major uses of pencil and paper in current 
testing are to measure students' knowledge of facts, 
concepts, and procedures, their ability to solve problems,
and their ability to comprehend text. Two additional
ways that paper and pencil might profitably be
used are: 

1. Paper and pencil can record how students 
compose texts and documents of different kinds. 
Paper and pencil are sometimes used to evaluate how 
well students can write a persuasive essay, a clear 
explanation, or an interesting story, but they also should
be used to evaluate students' reports, memos, letters, 
and even graphs, drawings, or musical scores. Much 
more sophisticated multimedia documents can be 
produced with computer tools, which may come to 
replace pencil and paper for document creation. 

2. Paper and pencil can record how students 
critique different documents or performances. For 
example, students can be asked to critique the methodology
of an experiment or the logic of an argument.
They might be asked to review a play, concert, book, 
or dance performance. Students' critical abilities are 
rarely evaluated in current testing. 

In this section we have tried to give an idea of the 
wide range of student abilities that are rarely, if ever, 
evaluated, and which the different media give us a 
means to document. Our argument is that current
testing gives us a very narrow view of students, and 
this narrowness fundamentally misdirects all of edu- 
cation. It is critical that we extend the scope of testing 
to represent much more broadly the range of abilities 
necessary to being an educated person. 

Many of the kinds of records proposed require 
subjective scoring, which some people object to as 
costly, time consuming, and inherently unfair. As we 
have argued elsewhere (Frederiksen & Collins, 1989), 
there are well-developed methods for achieving fair- 
ness in assessing student writing, and these methods 
are applicable to records from video and computers. 
Furthermore, the limits of what we know how to 
objectively score so fundamentally misdirect the 
educational enterprise that the real costs of objective 
scoring may far outweigh the costs of instituting a 
testing system that measures a broad range of student
abilities.

TASKS EMPLOYING
DIFFERENT MEDIA

We are currently trying to develop systemically
valid methods of assessing
student performance in the context of 
high school science. A key part of this 
work is to explore what kinds of tasks will enable 
students to use and demonstrate the broader range of 
abilities outlined above, and this requires very different 
kinds of tasks than are now the norm. Successful tasks 
are likely to have the following properties: they are
complex enough to engage students in real thinking 
and performances; they exemplify "authentic" work 
in the disciplines; they are open-ended enough to 
encourage different approaches, but sufficiently con- 
strained to permit reliable scoring; and appropriate 
records of student abilities can be readily collected 
and compiled for assessment purposes. We can illus- 
trate the kinds of tasks that we are recommending, 
using computers and video, by describing some assessment
tasks we have developed in the science
project, and also some tasks developed by other researchers.
For each task, we will also suggest different
scoring criteria that might be employed for evaluating
the student records. 

One of the important issues in design of successful 
tasks concerns the kinds of records that are collected.
These may take one or more forms, including the 
products of students' work; a finished presentation, 
performance, or verbal explanation; or aspects of 
students' thinking and problem-solving processes as 
they work on a task. Decisions about what process 
records to collect are interesting parts of our task 
development research. They might be "snapshots" of
key parts of the task (e.g., what configuration of
variables does a student select for a simulation). They 
might even be continuous recordings of students' 
reflections about their work. Essential to collecting 
records for assessment is that these records are efficient 
for scoring and that they capture the most important 
aspects of the different target abilities. It is also im- 
portant that the collection of process records not have 
the undesirable systemic effect of constraining stu- 
dents' ways of working, so that they have to carry out 
tasks in a rigidly prescribed way. 

Formulating relationships between variables. In 
our science project we are collecting data using a 
computer program called Physics Explorer. Physics
Explorer provides students with a simulation envi-
ronment in which there are a variety of different 
models, each with a large set of associated variables 
that can be manipulated. Students conduct experi- 
ments to determine how different variables affect each 
other within a physical system. For example, one task
duplicates Galileo's pendulum experiments, where 
the problem is to figure out what variables affect the 
period of motion. In a second task, the student must 
determine what variables affect the friction acting on 
a body moving through a liquid. Students might be 
evaluated in terms of the following traits: (1) how 
systematically they consider each possible independent
variable; (2) whether they systematically control other
variables while they test a hypothesis; (3) whether
they can formulate qualitative relationships between
the independent variables and the dependent variables;
and (4) whether they can formulate quantitative re- 
lationships between the independent variables and 
the dependent variables. 
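
Trait (2), the systematic control of variables, lends itself to a simple measure computed from process records. The following is a minimal sketch, assuming each simulation run is logged as a set of variable settings; the record format, the variable names, and the measure itself are our own illustrative assumptions and do not describe the actual output of Physics Explorer.

```python
def changed_variables(run_a, run_b):
    """Names of the independent variables whose settings differ between two runs."""
    return [name for name in run_a if run_a[name] != run_b[name]]


def controlled_comparison_rate(runs):
    """Fraction of consecutive run pairs in which exactly one variable changed."""
    pairs = list(zip(runs, runs[1:]))
    if not pairs:
        return 0.0
    controlled = sum(len(changed_variables(a, b)) == 1 for a, b in pairs)
    return controlled / len(pairs)


# Hypothetical log of pendulum trials, in the order a student ran them.
log = [
    {"length_m": 0.5, "mass_kg": 0.1, "amplitude_deg": 10},
    {"length_m": 1.0, "mass_kg": 0.1, "amplitude_deg": 10},  # only length changed
    {"length_m": 1.0, "mass_kg": 0.2, "amplitude_deg": 10},  # only mass changed
    {"length_m": 2.0, "mass_kg": 0.4, "amplitude_deg": 10},  # two changes: confounded
    {"length_m": 2.0, "mass_kg": 0.4, "amplitude_deg": 20},  # only amplitude changed
]

print(controlled_comparison_rate(log))
```

Comparing only consecutive runs is a deliberate simplification; a student who returns to an earlier configuration as a baseline would be undercounted, so a fuller measure might compare every pair of runs.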

Troubleshooting or diagnosing problems. An- 
other kind of task that arises in many different settings 
is diagnosing why a system is not behaving as ex- 
pected. Such problems are most common in computer 
programming, electronics, and medicine, but they can
occur with any system, such as government or business.
Using simulations of such systems, computers can 
provide students with a faulty version of a system, 
such as a circuit, and ask them to troubleshoot in order 
to find out why it is not doing what it is supposed to. 
Students' performances might be evaluated on such a 
task in terms of: (1) how they reason about a system's
behavior in order to generate hypotheses about faults;
(2) how systematically they collect data to evaluate
their hypotheses; and (3) how consistent their hypothesis
revisions are with the data they have collected.
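
Criterion (3) can be illustrated with a short sketch. It assumes a toy fault model for a battery-switch-bulb circuit and a log in which the student states a hypothesis after each reading; the fault names, test points, and log format are all hypothetical, chosen only to show how consistency between hypotheses and collected data might be scored.

```python
# Hypothetical fault model: the reading each candidate fault predicts
# at each test point of a simple battery-switch-bulb circuit.
PREDICTIONS = {
    "dead_battery": {"battery_v": "low", "bulb_v": "low"},
    "open_switch": {"battery_v": "ok", "bulb_v": "low"},
    "burnt_bulb": {"battery_v": "ok", "bulb_v": "ok"},
}


def is_consistent(fault, observations):
    """True if the fault predicts every reading collected so far."""
    return all(
        PREDICTIONS[fault][point] == value for point, value in observations.items()
    )


def consistency_score(steps):
    """Fraction of steps at which the stated hypothesis fits all data so far.

    Each step is (test_point, reading, hypothesis_stated_after_the_reading).
    """
    observations = {}
    consistent = 0
    for point, reading, hypothesis in steps:
        observations[point] = reading
        consistent += is_consistent(hypothesis, observations)
    return consistent / len(steps) if steps else 0.0


session = [
    ("battery_v", "ok", "dead_battery"),  # contradicted by the reading just taken
    ("bulb_v", "ok", "burnt_bulb"),       # fits both readings
]

print(consistency_score(session))
```

A score of this kind captures only whether hypotheses fit the data; judging the quality of the reasoning behind them, as in criterion (1), would still require a human assessor.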

Design. Computers provide a setting where stu- 
dents can carry out design tasks, such as designing a 
circuit, an ecosystem, or a governmental policy. The 
system can be tried out in a simulation, the effects of 
the design observed, and revisions made where ap- 
propriate. One possible task is for students to design a 
set of activities to teach younger students about 
Newton's Laws using a Dynaturtle in Logo (diSessa,
1982; White, 1984). A Dynaturtle is moved by firing
impulses, like a rocket in outer space, so that it makes
it possible to see the behavior of an object in a frictionless
environment. We might evaluate such a task in terms
of: (1) how creative the design is; (2) how well the
students understand the subject matter; (3) how systematic
or coherent the design is; (4) how well the
design carries out its intended purpose; and (5) how
polished the design is.

Tech. Rep. No. 12

April 1991

Learning with feedback. With many computer- 
simulation environments it is possible to give students
feedback on what they have done and hints as to good 
strategies to use (Campione & Brown, 1990; Frederiksen 
& White, 1990). In such environments it is possible to 
evaluate students in terms of: (1) how much their 
performance improves during some fixed period; (2) 
how responsive they are to suggestions given them; 
(3) how much they rely on hints; and (4) their overall 
performance level on the task. 
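
Three of these measures can be computed directly from a session record. The sketch below assumes each trial is logged as a score between 0 and 1 together with the number of hints used; that format and the sample session are invented for illustration. Criterion (2), responsiveness to suggestions, is not covered because it would require a record of what each suggestion said and what the student did next.

```python
def feedback_measures(trials):
    """Summarize a session; each trial is (score between 0 and 1, hints used)."""
    if len(trials) < 2:
        raise ValueError("need at least two trials to measure improvement")
    scores = [score for score, _ in trials]
    hints = [used for _, used in trials]
    return {
        # Criterion (1): change in performance over the fixed period.
        "improvement": scores[-1] - scores[0],
        # Criterion (3): reliance on hints, averaged over trials.
        "hints_per_trial": sum(hints) / len(trials),
        # Criterion (4): overall performance level on the task.
        "overall_level": sum(scores) / len(scores),
    }


# Hypothetical session: four trials, with hint use tapering off.
session = [(0.25, 3), (0.5, 2), (0.5, 1), (0.75, 0)]
summary = feedback_measures(session)
print(summary)
```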

For video, students can be assessed in the following 
kinds of tasks: 

Oral presentations. Students might be asked to 
present the results of their work on projects either to 
the teacher or the class as a whole. Such talks should 
include both a presentation portion, where clarification
questions are permitted, and a questioning period,
where the students are challenged to defend their 
beliefs. Students' presentations might be judged in 
terms of: (1) depth of understanding; (2) clarity; (3)
coherence; (4) responsiveness to questions; and (5) 
monitoring of their listeners' understanding. 

Paired explanations. This task makes it possible 
to evaluate students' ability to listen as well as to 
explain ideas. First, one student presents to another 
student an explanation of a project he or she has 
completed or a concept (e.g., gravity) he or she has 
been working on. Then the two students reverse roles. 
The students should use the blackboard or visual aids 
wherever appropriate. The explainers can be evalu- 
ated using the same criteria as for oral presentations. 
The listeners might be evaluated in terms of: (1) the 
quality of their questions; (2) their ability to summarize
what the explainer has said; (3) their helpfulness
in making the ideas clear; and (4) the appropriateness
of their interruptions. 

Joint problem solving. Another use of video is in 
judging students' ability to work together to solve 
problems. The joint problem-solving tasks can consist 
of hands-on science experiments, construction projects,
textbook problems, etc. The criteria for evaluating 
student performance might change depending on the 
task, but could consist of the following kinds of 
characteristics: (1) helpfulness; (2) creativity; (3) understanding;
(4) sharing of work; and (5) monitoring
progress toward the goal.

The objective in developing tasks to assess stu- 
dent ability is to find tasks that represent the entire 
range of activities that are required in life. Because we 
have been concerned with assessing scientific ability, 
we have been trying to design tasks that address the 
full range of qualities it is important for scientists to 
develop. This leads to a very different kind of assess- 
ment than traditional science assessments, which test 
only for students' recall of facts, concepts, and proce- 
dures, and their ability to solve short, well-defined 
problems. 

POSSIBLE OBJECTIONS TO
SYSTEMICALLY VALID TESTING

There are a number of issues that critics raise 
about the kind of testing system we have
proposed. These include the cost, the problem 
of cheating, and the dangers of using the sys- 
tem for surveillance, of teacher/parent prepping of 
students, and of exacerbating the difficulties of mi- 
norities in the school system. 

With respect to the cost issue, it is certainly true 
that the kind of testing proposed is much more expen- 
sive to administer. We would argue that testing by an 
outside agency should be extremely limited in any 
case, and so the high costs might have an incidental 
benefit of reducing the amount of outside, "on-de- 
mand" testing in our schools. Ideally, much of stu- 
dents' in-class effort would go into producing prod- 
ucts that they and their teachers try to evaluate. Some 
of those might eventually go into a portfolio that 
would be part of the submission to an outside testing 
agency. Costs can also be minimized by having trained 
teachers in each school conduct interviews with stu- 
dents that form part of the students' record to be 
evaluated by an outside agency. To reiterate, the real 
cost of the current testing system is its misdirection of 
education. Our view is that it should be possible to 
develop a cost-effective testing system that does not 
have perverse effects on education. 

The problem of cheating can be serious in any 
portfolio testing scheme. The problem is less severe 
with video than with either written or computer
records, since video documents real-time performance
and it is difficult to falsify such a record. It is possible
to practice until the performance is quite smooth, but
it should be possible for judges to evaluate spontaneity
if such a characteristic is desired for certain records.
However, the best way to deal with cheating on any
portfolio submission is to conduct an interview with
students about the portfolio in order to verify its
authenticity. Such an interview can probe into differ-
ent aspects of the portfolio, to determine how deeply 
the student understands the topics covered in the 
portfolio. 

Some people worry that computers and videos 
will be used to maintain surveillance of students as 
part of their assessment function. For example, computer-based
integrated learning systems that give
students a sequence of tasks to work on keep records
of how each student does on each task and how they
are progressing through the sequence. If a teacher is so
inclined, it is possible to keep fairly close track of
students with such a system. This type of surveillance 
raises issues of privacy and motivation: Will students 
come to feel that they are constantly being watched, 
and will they feel totally constrained to do everything 
according to the rules, allowing for no inventiveness 
or exploration? We do not think that this is the most 
effective use of computers for education (Collins, in 
press; Collins, Hawkins, & Carver, in press), but we
think the best safeguard against such a danger is a
portfolio system, where students decide what should 
be submitted for assessment. 

The goal of the system is to encourage prepping of 
students by teachers and parents toward legitimate 
goals of education. Obviously parents or teachers, 
who care about education and who have the skills to 
do so, may coach students more than those who do 
not. This in turn could exacerbate the problems chil- 
dren from some minority cultures have, though not 
necessarily. If minority cultures value hands-on activity 
or oral language more than abstract thinking and 
written language (Gardner, 1990), the involvement of 
media that can capture different cultural emphases
may offset coaching differences. As a society, we need
to encourage all minority cultures to emphasize edu- 
cation for their children, and perhaps a testing system 
that provides them areas in which to excel will make 
this emphasis easier to realize. 

There is the problem that many parents, including 
those from minority cultures, think that education 
must focus on the types of abilities currently embod- 
ied in tests. Our thesis is that there needs to be a 
fundamental change in public understanding of the 
goals of education. But such a change will only come 
very slowly, and it is likely to follow rather than 
precede any changes in the educational system (Collins, 
in press). 

CONCLUSION 

We are at the beginning of a program of 
research to demonstrate the reliability of
an entirely new approach to assessment 
in schools. If it is viable, we would hope 
that it could be put in place in a number of schools and 
be used as an alternative form of testing for assigning 
student grades and admission to college. But the big- 
gest challenges are still to come. 

We would like to reiterate the problems that we
see ahead (from Frederiksen & Collins, 1989). Clearly, 
much research needs to be done to test the assumptions
on which our proposal is based. Can performances
be reliably assessed on a common scale when the 
particular tasks that testees carry out may vary? Does 
an awareness of criteria help students to improve 
performance on projects and teachers to become more 
effective in the classroom? Can a consensus be reached 
on what are appropriate criteria for different domains 
and activities? Can scoring standards be met when 
assessment is decentralized? These and other ques- 
tions are the focus of our research effort in support of 
a new, systemically valid system of educational test- 
ing. 




References 



Brown, J. S., Collins, A., & Duguid, P. (1989). Situated
cognition and the culture of learning. Educational Researcher,
18(1), 32-41.

Campione, J. C., & Brown, A. L. (1990). Guided learning and
transfer: Implications for approaches to assessment. In
N. Frederiksen, R. Glaser, A. Lesgold, & M. Shafto
(Eds.), Diagnostic monitoring of skills and knowledge acquisition
(pp. 141-172). Hillsdale, NJ: Erlbaum.

Collins, A. (1990a). Reformulating testing to measure learning
and thinking. In N. Frederiksen, R. Glaser, A. Lesgold,
& M. Shafto (Eds.), Diagnostic monitoring of skills and
knowledge acquisition (pp. 75-87). Hillsdale, NJ: Erlbaum.

Collins, A. (1990b). Cognitive apprenticeship and instructional
technology. In L. Idol & B. F. Jones (Eds.), Educational
values and cognitive instruction: Implications for
reform (pp. 119-136). Hillsdale, NJ: Erlbaum.

Collins, A. (in press). The role of computer technology in 
restructuring schools. In K. Sheingold & M. Tucker 
(Eds.), Restructuring for learning with technology. Roch- 
ester, NY: Center for Education and the Economy. 

Collins, A., & Brown, J. S. (1988). The computer as a tool for 
learning through reflection. In H. Mandl & A. Lesgold 
(Eds.), Learning issues for intelligent tutoring systems (pp. 
1-18). New York: Springer-Verlag. 

Collins, A., Hawkins, J., & Carver, S. M. (in press). A cognitive
apprenticeship for disadvantaged students. In B.
Means (Ed.), Teaching advanced skills to disadvantaged
students.

diSessa, A. (1982). Unlearning Aristotelian physics: A study
of knowledge-based learning. Cognitive Science, 6, 37-76.

Frederiksen, J. R., & White, B. Y. (1990). Intelligent tutors as
intelligent testers. In N. Frederiksen, R. Glaser, A.
Lesgold, & M. Shafto (Eds.), Diagnostic monitoring of skills
and knowledge acquisition (pp. 1-25). Hillsdale, NJ:
Erlbaum.

Frederiksen, J. R., & Collins, A. (1989). A systems approach
to educational testing. Educational Researcher, 18(9), 27-
32.

Frederiksen, N. (1984). The real test bias. American Psychologist,
39(3), 193-201.

Gardner, H. (1990). Assessment in context: The alternative
to standardized testing. In B. Gifford & C. O'Connor
(Eds.), Future assessments: Changing views of aptitude,
achievement, and instruction. Boston: Kluwer.

Schoenfeld, A. H. (in press). On mathematics as sense- 
making: An informal attack on the unfortunate divorce 
of formal and informal mathematics. In D. N. Perkins, 
J. Segal, & J. Voss (Eds.), Informal reasoning and education.
Hillsdale, NJ: Erlbaum.

Schoenfeld, A. H. (1985). Mathematical problem solving. Orlando,
FL: Academic Press.

White, B. Y. (1984). Designing computer activities to help
physics students understand Newton's laws of motion.
Cognition and Instruction, 1, 69-108.

Wiggins, G. (1989, May). A true test: Toward more authentic
and equitable assessment. Phi Delta Kappan, 703-713.

Zuboff, S. (1988). In the age of the smart machine: The future of 
work and power. New York: Basic Books. 






